Video encoding method and corresponding computer programme



The present invention generally relates to the field of data compression and,
more specifically, to a method of encoding a sequence of frames, composed of picture
elements (pixels), by means of a three-dimensional (3D) subband decomposition involving a
filtering step applied, in the sequence considered as a 3D volume, to the spatial-temporal data
which correspond in said sequence to each one of successive groups of frames (GOFs), these
GOFs being themselves subdivided into successive pairs of frames (POFs) including a
so-called previous frame and a so-called current frame, said decomposition being applied to said
GOFs together with motion estimation and compensation steps performed in each GOF on
said POFs and on corresponding pairs of low-frequency temporal subbands (POSs) obtained
at each temporal decomposition level.

The invention also relates to a computer programme comprising a set of 
instructions for the implementation of said encoding method, when said programme is carried 
out by a processor included in an encoding device. 



In recent years, three-dimensional (3D) subband analysis, based on a 3D, or
(2D+t), wavelet decomposition of a sequence of frames considered as a 3D volume, has been
increasingly studied for video compression. The wavelet transform generates coefficients
that constitute a hierarchical pyramid in which the spatio-temporal relationship is defined
by 3D orientation trees showing the parent-offspring dependencies between said
coefficients. The in-depth scanning of the generated coefficients in the hierarchical trees and
a progressive bitplane encoding technique then lead to a desired quality scalability.

A practical solution for implementing this approach is to generate motion
compensated temporal subbands using a simple two-tap wavelet filter, as illustrated in Fig. 1
for a GOF of eight frames. In the illustrated implementation, the input video sequence is
divided into Groups of Frames (GOFs), and each GOF, itself subdivided into successive
couples of frames (that are as many inputs for a so-called Motion-Compensated Temporal
Filtering, or MCTF, module), is first motion-compensated (MC) and then temporally filtered
(TF). The resulting low frequency (L) temporal subbands of the first temporal decomposition
level are further filtered (TF), and the process may stop after an arbitrary number of
decompositions resulting in one or more low frequency subbands called root temporal
subbands (in the illustration, a non-limitative example with two decomposition levels
resulting in two root subbands LL is presented). In the example of Fig. 1, the frames of the
illustrated group are referenced F1 to F8, and the dotted arrows correspond to a high-pass
temporal filtering, while the other ones correspond to a low-pass temporal filtering. Two
stages of decomposition are shown (L and H = first stage; LL and LH = second stage). At
each temporal decomposition level of the illustrated group of 8 frames, a group of motion
vector fields is generated (in the present example, MV4 at the first level and MV3 at the
second one).

When a Haar multiresolution analysis is used for the temporal decomposition,
since one motion vector field is generated between every two frames in the considered group
of frames at each temporal decomposition level, the number of motion vector fields is equal
to half the number of frames in the temporal subband, i.e. four at the first level of motion
vector fields and two at the second one. Motion estimation (ME) and motion compensation
(MC) are only performed every two frames of the input sequence (generally in the forward
way), due to the temporal down-sampling by two of the simple wavelet filter. Using these
very simple filters, each low frequency temporal subband (L) represents a temporal average
of the input couples of frames, whereas the high frequency one (H) contains the residual error
after the MCTF step.
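
The two-level decomposition described above can be sketched as follows. This is a minimal illustration only, under assumptions that are not in the text: motion is taken to be zero (each pixel is filtered with the co-located pixel of the paired frame), frames are flat lists of sample values, and the filter scaling (average for L, plain difference for H) is chosen for readability. The function names are hypothetical.

```python
def haar_temporal_level(frames):
    """Filter consecutive (previous, current) pairs of frames once.

    Assumption of this sketch: zero motion, so every pixel is filtered with
    the co-located pixel of the paired frame. L is the pair average and H
    the difference (current minus previous); the scaling is illustrative.
    """
    lows, highs = [], []
    for prev, cur in zip(frames[0::2], frames[1::2]):
        lows.append([(a + b) / 2 for a, b in zip(prev, cur)])
        highs.append([b - a for a, b in zip(prev, cur)])
    return lows, highs


def decompose_gof(frames, levels):
    """Return the root subbands, the H subbands of each level, and the
    number of motion vector fields each level would require."""
    highs_per_level = []
    fields_per_level = []
    current = frames
    for _ in range(levels):
        # One motion vector field per pair of frames at this level.
        fields_per_level.append(len(current) // 2)
        current, highs = haar_temporal_level(current)
        highs_per_level.append(highs)
    return current, highs_per_level, fields_per_level


# A GOF of eight tiny frames (two samples each), two levels as in Fig. 1.
gof = [[float(i), float(2 * i)] for i in range(1, 9)]
roots, highs, fields = decompose_gof(gof, levels=2)
print(fields)  # [4, 2]
print(roots)   # [[2.5, 5.0], [6.5, 13.0]]
```

The field counts reproduce the figures quoted above for a GOF of eight frames: four motion vector fields at the first level and two at the second, with two root subbands LL remaining.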

Unfortunately, the motion compensated temporal filtering may raise the
problem of unconnected pixels, which are not filtered at all (or also the problem of
double-connected pixels, which are filtered twice). The number of unconnected pixels represents a
weakness of 3D subband codec approaches because it strongly impacts the resulting picture
quality, particularly in occlusion regions. This is especially true for high motion sequences or
for final temporal decomposition levels, where the temporal correlation is weak. The
number of these unconnected pixels depends on the dense motion vector field that has been
generated by the motion estimation.
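
How a dense motion vector field gives rise to unconnected and double-connected pixels can be illustrated with the short sketch below. It is not taken from the text: the function name, the sign convention (a current-frame pixel at (x, y) with vector (mx, my) references the previous-frame pixel at (x - mx, y - my)) and the handling of references that leave the frame are assumptions made for the example.

```python
def classify_previous_pixels(width, height, motion_field):
    """Count how often each previous-frame pixel is referenced.

    motion_field[y][x] = (mx, my) is the vector of current-frame pixel
    (x, y), assumed to reference previous-frame pixel (x - mx, y - my).
    References that fall outside the frame are ignored (an assumption).
    """
    hits = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            mx, my = motion_field[y][x]
            px, py = x - mx, y - my
            if 0 <= px < width and 0 <= py < height:
                hits[py][px] += 1
    coords = [(x, y) for y in range(height) for x in range(width)]
    unconnected = [(x, y) for (x, y) in coords if hits[y][x] == 0]
    multiple = [(x, y) for (x, y) in coords if hits[y][x] > 1]
    return unconnected, multiple


# One row of four pixels: the two right-hand pixels moved one pixel right.
field = [[(0, 0), (0, 0), (1, 0), (1, 0)]]
unconnected, multiple = classify_previous_pixels(4, 1, field)
print(unconnected)  # [(3, 0)]
print(multiple)     # [(1, 0)]
```

In this example the previous-frame pixel at x = 1 is referenced twice, and the pixel at x = 3 is left without any reference, which is the situation the paragraph above describes.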

Current criteria for optimal motion vector search used in motion estimators do
not take into account the number of unconnected pixels that will result from motion
compensation. The most sophisticated algorithms use a rate/distortion criterion which tends to
minimize a cost function that depends on the displaced difference energy (distortion) and the
number of bits spent to transmit the motion vector (rate). For example, the motion search
returns the motion vector that minimizes:

$$J(m) = SAD(s, c(m)) + \lambda_{MOTION} \cdot R(m - p) \qquad (1)$$

In this expression (1), $m = (m_x, m_y)^T$ is the motion vector, $p = (p_x, p_y)^T$ is the
prediction for the motion vector, and $\lambda_{MOTION}$ is the Lagrange multiplier. The rate term
$R(m - p)$ represents the motion information only, and SAD, used as distortion measure, is
computed as:

$$SAD(s, c(m)) = \sum_{x=1,\,y=1}^{B,\,B} \left| s[x, y] - c[x - m_x,\, y - m_y] \right| \qquad (2)$$

with s being the original video signal, c being the coded video signal and B being the block
size (note that B can be 1). Unfortunately, these algorithms do not take into account the
distortion introduced by unconnected pixels during the inverse motion compensation because
usually these optimizations are applied to hybrid coding for which the inverse motion
compensation is not performed.
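
A conventional search minimizing expression (1) may be sketched as follows. The sketch is illustrative only. The rate term is not specified in the text, so `toy_rate` (the sum of the absolute components of m - p) is a hypothetical stand-in, and the exhaustive search window, the 0-based block origin and all function names are likewise assumptions.

```python
def sad(s, c, x0, y0, B, m):
    """Expression (2) over the B x B block whose top-left pixel is (x0, y0)."""
    mx, my = m
    return sum(
        abs(s[y][x] - c[y - my][x - mx])
        for y in range(y0, y0 + B)
        for x in range(x0, x0 + B)
    )


def toy_rate(d):
    """Hypothetical rate model for R(m - p); not taken from the text."""
    return abs(d[0]) + abs(d[1])


def cost_j(s, c, x0, y0, B, m, p, lam_motion):
    """Expression (1): J(m) = SAD(s, c(m)) + lambda_MOTION * R(m - p)."""
    d = (m[0] - p[0], m[1] - p[1])
    return sad(s, c, x0, y0, B, m) + lam_motion * toy_rate(d)


def motion_search(s, c, x0, y0, B, p, lam_motion, search_range):
    """Exhaustive search returning the vector that minimizes J(m)."""
    height, width = len(c), len(c[0])
    best_m, best_j = None, None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            # Skip vectors whose displaced block would leave the frame.
            if not (0 <= x0 - mx and x0 + B - 1 - mx < width):
                continue
            if not (0 <= y0 - my and y0 + B - 1 - my < height):
                continue
            j = cost_j(s, c, x0, y0, B, (mx, my), p, lam_motion)
            if best_j is None or j < best_j:
                best_m, best_j = (mx, my), j
    return best_m, best_j


# Reference frame c, and a signal s equal to c shifted one pixel to the right.
c = [[10 * x + y for x in range(4)] for y in range(4)]
s = [[c[y][max(x - 1, 0)] for x in range(4)] for y in range(4)]
best_m, best_j = motion_search(s, c, 1, 1, 2, (0, 0), 0.5, 1)
print(best_m, best_j)  # (1, 0) 0.5
```

Note that nothing in this cost depends on which pixels of the reference frame end up unconnected, which is the limitation pointed out above.
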

It is therefore an object of the invention to avoid such a drawback and to
propose a video encoding method in which the set of unconnected pixels is taken into 
account in the distortion measure. 

To this end, the invention relates to a method such as defined in the
introductory paragraph and which is moreover characterized in that, said process of motion
compensated temporal filtering leading in the previous frames on the one hand to connected
pixels, that are filtered along a motion trajectory corresponding to motion vectors defined by
means of said motion estimation steps, and on the other hand to a residual number of
so-called unconnected pixels, that are not filtered at all, each motion estimation step comprises a
motion search provided for returning a motion vector that minimizes a cost function
depending at least on a distortion criterion involving a distortion measure, said distortion
measure being also applied to the set of said unconnected pixels.



The present invention will now be described, by way of example, with 
reference to the accompanying drawing in which Fig. 1 shows a temporal multiresolution 
analysis with motion compensation. 

Because unconnected pixels contribute strongly to the quality degradation of the
inverse motion compensated image, the set of unconnected pixels is, according to the
invention, taken into account in the distortion measure. To this end, it is here proposed to
introduce a new rate/distortion criterion that extends equation (1) by taking into account the
unconnected pixels phenomenon. This is illustrated in equations (3) and (4), which are
equivalent:

$$K(m) = J(m) + \lambda_{UNCONNECTED} \cdot D(S_{UNCONNECTED}(m)) \qquad (3)$$

$$K(m) = SAD(s, c(m)) + \lambda_{UNCONNECTED} \cdot D(S_{UNCONNECTED}(m)) + \lambda_{MOTION} \cdot R(m - p) \qquad (4)$$

with $D(S_{UNCONNECTED}(m))$ being the distortion measure for the set $S_{UNCONNECTED}$ of
unconnected pixels resulting from motion vector m. Several distortion measures can be
applied to the set of unconnected pixels. A very simple measure is preferably the count of
unconnected pixels for the motion vector under study.
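
The effect of the additional term can be seen on a small numerical example. The figures below are invented for illustration (they are not results from the text), the distortion measure D is taken to be the count of unconnected pixels, and the function name and the weight are assumptions.

```python
def extended_cost(j_value, unconnected_count, lam_unconnected):
    """K(m) = J(m) + lambda_UNCONNECTED * D, with D the unconnected-pixel count."""
    return j_value + lam_unconnected * unconnected_count


# Hypothetical candidates: name -> (J(m), number of unconnected pixels).
candidates = {"m_a": (100.0, 12), "m_b": (104.0, 3)}
lam_unconnected = 1.0  # illustrative weight, not a recommended value

best_by_j = min(candidates, key=lambda name: candidates[name][0])
best_by_k = min(
    candidates,
    key=lambda name: extended_cost(*candidates[name], lam_unconnected),
)
print(best_by_j, best_by_k)  # m_a m_b
```

Here the candidate with the slightly higher J(m) is preferred under K(m), because it leaves far fewer pixels unconnected (K = 107 against 112).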

It can be noted that the real set of unconnected pixels resulting from a motion
search can be computed only when the motion vector information is available for the whole
frame. Therefore, an optimal solution is hardly achievable (in fact a complex set of
minimisation criteria for the whole frame would have to be solved), and a sub-optimal
implementation is therefore proposed. This implementation, which is not recursive, can be
considered as a simple way to take into account the distortion due to unconnected pixels. For
a given part of the image to be motion compensated (a part of the image can be a pixel, a
block of pixels, a macroblock of pixels or any region, provided that the set of parts covers the
whole image without any overlapping) and for a given motion vector candidate m, a temporary
inverse motion compensation is applied, the set of unconnected pixels is identified, and
$D(S_{UNCONNECTED}(m))$ can be evaluated. The current $K(m)$ value can then be computed and
compared to the current minimum value $K_{min}(m)$ to check if the candidate motion vector
brings a lower $K(m)$ value (for the first motion vector candidate, $K_{min}(m)$ is obviously equal to
the value $K(m)$ computed). When all the candidates have been tested, the (final) inverse
motion compensation is applied to the best candidate (identifying connected and unconnected
pixels). The next part of the image can then be processed, and so on up to a complete
processing of the whole image.
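
The single-pass procedure just described may be sketched as follows, for one-dimensional frames to keep the example short. Several points are assumptions of the sketch rather than statements of the text: the parts are consecutive blocks of B pixels, the rate term is a toy model with a zero predictor, and D is approximated by the number of pixels of the part whose reference pixel in the previous frame has already been connected by an earlier part (each such collision leaves one previous-frame pixel without a connection). The candidate list is assumed to contain the zero vector.

```python
def single_pass_search(cur, prev, B, candidates, lam_motion, lam_unconnected):
    """Non-recursive search over consecutive parts of B pixels (1-D frames)."""
    n = len(cur)
    claimed = [False] * n  # previous-frame pixels already connected
    vectors = []
    for start in range(0, n, B):
        part = range(start, min(start + B, n))
        best_m, best_k = None, None
        for m in candidates:
            refs = [x - m for x in part]
            if any(r < 0 or r >= n for r in refs):
                continue  # candidate points outside the frame
            sad = sum(abs(cur[x] - prev[r]) for x, r in zip(part, refs))
            j = sad + lam_motion * abs(m)  # toy rate term, predictor p = 0
            # Temporary inverse motion compensation: count the collisions
            # with pixels connected by the parts processed so far.
            d = sum(1 for r in refs if claimed[r])
            k = j + lam_unconnected * d
            if best_k is None or k < best_k:
                best_m, best_k = m, k
        if best_m is None:
            raise ValueError("no valid candidate; include the zero vector")
        # Final inverse motion compensation for the best candidate.
        for x in part:
            claimed[x - best_m] = True
        vectors.append(best_m)
    unconnected = [x for x in range(n) if not claimed[x]]
    return vectors, unconnected


prev = [10, 20, 30, 40, 50, 60]
cur = [10, 20, 11, 21, 50, 60]
search = [-2, -1, 0, 1, 2]
plain = single_pass_search(cur, prev, 2, search, 1.0, 0.0)
aware = single_pass_search(cur, prev, 2, search, 1.0, 20.0)
print(plain)  # ([0, 2, 0], [2, 3])
print(aware)  # ([0, 0, 0], [])
```

With a zero weight the second part selects a vector that reuses pixels already connected by the first part, leaving two pixels unconnected; with the penalty enabled it settles for a vector with a higher J(m) and no unconnected pixel remains.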

However, in this non-recursive implementation, the resulting decisions are not
always spatially homogeneous over the whole image: for the first part of the image to be
motion compensated, the set of unconnected pixels may be empty, while the probability of
unconnected pixels for the last part of the image to be motion compensated is then very high.






WO 2004/053798 

PCT/IB2003/005766 


This situation can lead to heterogeneous spatial distortions. In order to avoid such a
problem, which results from the single-pass implementation, a multiple-pass implementation
can be proposed, which improves on the single-pass one by minimizing the global
criterion $\sum_{\text{all parts}} K(m)$ for all parts of the whole image. This multiple-pass
implementation includes the following steps.

First, for all the parts of the image, the optimal motion vector $m_{opt}$ is
computed, as well as a set of $N_{sub-opt}$ sub-optimal motion vectors $\{m_{sub-opt}\}$ that provide the
minimum values for $J(m)$ of equation (1), the number of unconnected pixels being not used
at this stage (the number of sub-optimal vectors $N_{sub-opt}$ is implementation dependent). For
all these vectors, the corresponding value for the criterion $J(m)$ is stored, in order to
generate $J(m_{opt})$ and $\{J(m_{sub-opt})\}$. Then the inverse motion compensation is applied for the
optimal motion vectors $m_{opt}$ so that $\sum_{\text{all parts}} K(m_{opt})$ can be computed (note that
$\sum_{\text{all parts}} K(m_{opt})$ is not the optimal value for $\sum_{\text{all parts}} K(m)$, because $m_{opt}$ is
optimizing $J(m)$ and not $K(m)$).
From the list of sub-optimal vectors, the candidate motion vector $m_{candidate}$ minimizing
$\left| J(m_{opt}) - J(m_{sub-opt}) \right|$ is then selected (note that $m_{candidate}$ can be a vector of any part of
the current image). For the set of optimal motion vectors and the candidate vector (in place of
the optimal vector for the corresponding part of the image), an inverse motion compensation
is applied and $\sum_{\text{all parts}} K(m)$ is computed. If its value is lower than $\sum_{\text{all parts}} K(m_{opt})$, the
optimal value of $m_{opt}$ is replaced by $m_{candidate}$ (for the corresponding part of the image).
Finally, $m_{candidate}$ is discarded from the list of sub-optimal vectors, a new candidate is
selected and the same mechanism is applied until the list of sub-optimal vectors is empty, in
order to obtain the optimal set of motion vectors.
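
The multiple-pass steps above may be sketched as follows, again for one-dimensional frames. The sketch rests on assumptions that the text leaves open: the global criterion is computed as the sum of the J values of all parts plus the penalty for the pixels left unconnected after inverse motion compensation of the whole frame, the gaps |J(m_opt) - J(m_sub-opt)| are taken from the values stored in the first step (so the candidates can be ordered once), the rate term is a toy model with a zero predictor, and the candidate list contains the zero vector. All names are hypothetical.

```python
def part_cost_j(cur, prev, part, m, lam_motion):
    """J(m) for one part of a 1-D frame; None if m leaves the frame."""
    refs = [x - m for x in part]
    if any(r < 0 or r >= len(prev) for r in refs):
        return None
    sad = sum(abs(cur[x] - prev[r]) for x, r in zip(part, refs))
    return sad + lam_motion * abs(m)  # toy rate term, predictor p = 0


def total_k(j_values, vectors, parts, n, lam_unconnected):
    """Sum of J over all parts plus the penalty for the pixels left
    unconnected after inverse motion compensation of the whole frame."""
    claimed = [False] * n
    for part, m in zip(parts, vectors):
        for x in part:
            claimed[x - m] = True
    return sum(j_values) + lam_unconnected * claimed.count(False)


def multi_pass_search(cur, prev, B, candidates, n_sub, lam_motion, lam_unc):
    n = len(cur)
    parts = [range(s, min(s + B, n)) for s in range(0, n, B)]
    vectors, j_cur, pending = [], [], []
    # Optimal and sub-optimal vectors per part, with their stored J values.
    for i, part in enumerate(parts):
        scored = []
        for m in candidates:
            j = part_cost_j(cur, prev, part, m, lam_motion)
            if j is not None:
                scored.append((j, m))
        scored.sort()
        vectors.append(scored[0][1])
        j_cur.append(scored[0][0])
        for j, m in scored[1:1 + n_sub]:
            pending.append((abs(scored[0][0] - j), i, m, j))
    # Global criterion for the vectors that are optimal with respect to J.
    best = total_k(j_cur, vectors, parts, n, lam_unc)
    # Try the sub-optimal vectors, smallest gap in J first, keeping a
    # substitution only when it lowers the global criterion.
    pending.sort()
    for _, i, m, j in pending:
        trial_v = vectors[:i] + [m] + vectors[i + 1:]
        trial_j = j_cur[:i] + [j] + j_cur[i + 1:]
        k = total_k(trial_j, trial_v, parts, n, lam_unc)
        if k < best:
            vectors, j_cur, best = trial_v, trial_j, k
    return vectors, best


prev = [10, 20, 30, 40, 50, 60]
cur = [10, 20, 11, 21, 50, 60]
result = multi_pass_search(cur, prev, 2, [-2, -1, 0, 1, 2], 2, 1.0, 20.0)
print(result)  # ([0, 0, 0], 38.0)
```

In this example the vectors minimizing J alone give a global criterion of 44 with two unconnected pixels; two successive substitutions in the second part lower it to 39 and then to 38, at which point no pixel is left unconnected.
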

CLAIMS:

1. A method of encoding a sequence of frames, composed of picture elements
(pixels), by means of a three-dimensional (3D) subband decomposition involving a filtering
step applied, in the sequence considered as a 3D volume, to the spatial-temporal data which
correspond in said sequence to each one of successive groups of frames (GOFs), these GOFs
being themselves subdivided into successive pairs of frames (POFs) including a so-called
previous frame and a so-called current frame, said decomposition being applied to said GOFs
together with motion estimation and compensation steps performed in each GOF on said
POFs and on corresponding pairs of low-frequency temporal subbands (POSs) obtained at
each temporal decomposition level, this process of motion compensated temporal filtering
leading in the previous frames on the one hand to connected pixels, that are filtered along a
motion trajectory corresponding to motion vectors defined by means of said motion
estimation steps, and on the other hand to a residual number of so-called unconnected pixels,
that are not filtered at all, each motion estimation step comprising a motion search provided
for returning a motion vector that minimizes a cost function depending at least on a distortion
criterion involving a distortion measure, said distortion measure being also applied to the set
of said unconnected pixels.






2. An encoding method according to claim 1, in which said motion search is
provided for returning the motion vector that minimizes the following expression (1):

$$J(m) = SAD(s, c(m)) + \lambda_{MOTION} \cdot R(m - p) \qquad (1)$$

where $m = (m_x, m_y)^T$ is the motion vector, $p = (p_x, p_y)^T$ is the prediction for the motion
vector, $\lambda_{MOTION}$ is the Lagrange multiplier, the rate term $R(m - p)$ represents the motion
information only, SAD used as distortion measure is computed as:

$$SAD(s, c(m)) = \sum_{x=1,\,y=1}^{B,\,B} \left| s[x, y] - c[x - m_x,\, y - m_y] \right| \qquad (2)$$

s is the original video signal, c is the coded video signal and B is the block size,
characterized in that the distortion criterion extends (1) by taking into account the
unconnected pixels phenomenon, according to the following expression (3):

$$K(m) = J(m) + \lambda_{UNCONNECTED} \cdot D(S_{UNCONNECTED}(m)) \qquad (3)$$

in which $D(S_{UNCONNECTED}(m))$ is the distortion measure for the set $S_{UNCONNECTED}$ of
unconnected pixels resulting from the motion vector m.

3. An encoding method according to claim 2, characterized in that it includes the
following steps, successively applied to each part of the whole image to be motion-compensated:

(a) for the considered part of the image and for a given motion vector candidate
m, a temporary inverse motion compensation is applied;

(b) the set of unconnected pixels is identified;

(c) $D(S_{UNCONNECTED}(m))$ is evaluated;

(d) the current $K(m)$ value is computed and compared to the current minimum
value $K_{min}(m)$ to check if the candidate motion vector brings a lower $K(m)$ value;

(e) when all the candidates have been tested, the final inverse motion compensation is
applied to the best candidate;

(f) the steps (a) to (e) are then applied to the next part of the image that can be [...]

4. An encoding method according to claim 2, characterized in that it includes, for
taking into account the distortion due to the unconnected pixels and minimizing the
criterion $\sum_{\text{all parts}} K(m)$ for the whole image to be compensated, the following steps:

(a) the optimal motion vector $m_{opt}$ is computed, as well as a set of $N_{sub-opt}$
sub-optimal motion vectors $\{m_{sub-opt}\}$;

(b) for all these vectors, the corresponding value for the criterion $J(m)$ is stored, in
order to generate $J(m_{opt})$ and $\{J(m_{sub-opt})\}$;

(c) an inverse motion compensation is applied for the optimal motion vectors $m_{opt}$,
in order to compute $\sum_{\text{all parts}} K(m_{opt})$;

(d) from the list of sub-optimal vectors, the candidate motion vector $m_{candidate}$
minimizing $\left| J(m_{opt}) - J(m_{sub-opt}) \right|$ is selected;

(e) for the set of optimal motion vectors and the candidate vector, an inverse
motion compensation is applied, in order to compute again $\sum_{\text{all parts}} K(m)$;

(f) if the value of $\sum_{\text{all parts}} K(m)$ is lower than $\sum_{\text{all parts}} K(m_{opt})$, the
optimal value of $m_{opt}$ is replaced by $m_{candidate}$;

(g) finally, $m_{candidate}$ is discarded from the list of sub-optimal vectors;

(h) a new candidate is selected, and the same mechanism is then applied until the
list of sub-optimal vectors is empty, in order to obtain the optimal set of motion vectors.



5. A computer programme comprising a set of instructions for the
implementation of a method according to any one of claims 3 and 4, when said programme is
carried out by a processor included in an encoding device.